Explore CONUS404 Dataset#
This dataset was created by extracting specified variables from a collection of wrf2d output files, rechunking the data to better support extraction for a variety of use cases, and adding CF conventions to enable easier analysis, visualization, and data extraction using Xarray and HoloViz.
import os
os.environ['USE_PYGEOS'] = '0'  # tell geopandas to use Shapely instead of PyGEOS; must be set before geospatial imports
import fsspec
import xarray as xr
import hvplot.xarray
import intake
import metpy
import cartopy.crs as ccrs
Open Dataset#
1) intake#
For this demonstration notebook, we will open a cloud-native dataset. The details
of its access are stored in an intake catalog.
cat = intake.open_catalog(
r"https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml"
)
## NOTE: we happen to know this dataset's handle/name.
dataset = 'conus404-hourly-cloud'
## If you did not know this name, you could list the datasets in the catalog with
## the command `list(cat)`
## But since we do know the name, let's see its metadata
cat[dataset]
conus404-hourly-cloud:
args:
consolidated: true
storage_options:
requester_pays: true
urlpath: s3://nhgf-development/conus404/conus404_hourly_202209.zarr
description: CONUS404 Hydro Variable subset, 40 years of hourly values on Cloud
driver: intake_xarray.xzarr.ZarrSource
metadata:
catalog_dir: https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog
NOTE: This particular dataset has the requester_pays option set to true. This means
we must set up our AWS credentials; otherwise we won’t be able to load the data from S3
object storage.
os.environ['AWS_PROFILE'] = 'default'
%run ../environment_set_up/Help_AWS_Credentials.ipynb
2) Start parallel cluster#
Some of the steps below are aware of parallel, clustered compute environments
using Dask. We’re going to start a cluster now so that subsequent steps can take
advantage of this ability.
This is an optional step, but can speed up data loading significantly, especially when accessing data from the cloud.
%run ../environment_set_up/Start_Dask_Cluster_Nebari.ipynb
## If this notebook is not being run on Nebari/ESIP, replace the above
## path name with a helper appropriate to your compute environment. Examples:
# %run ../environment_set_up/Start_Dask_Cluster_Denali.ipynb
# %run ../environment_set_up/Start_Dask_Cluster_Tallgrass.ipynb
The 'cluster' object can be used to adjust cluster behavior, e.g. 'cluster.adapt(minimum=10)'
The 'client' object can be used to directly interact with the cluster, e.g. 'client.submit(func)'
The link to view the client dashboard is:
> https://nebari.esipfed.org/gateway/clusters/dev.1276907c8635401c9189ce15eab851f5/status
3) Explore and verify the dataset#
print(f"Reading {dataset} metadata...", end='')
ds = cat[dataset].to_dask().metpy.parse_cf()
print("done")
# Examine the grid data structure for SNOW:
ds.SNOW
Reading conus404-hourly-cloud metadata...
done
<xarray.DataArray 'SNOW' (time: 368064, y: 1015, x: 1367)>
dask.array<open_dataset-60f4cd5b1dab559310716eb4d524a8baSNOW, shape=(368064, 1015, 1367), dtype=float32, chunksize=(144, 175, 175), chunktype=numpy.ndarray>
Coordinates:
lat (y, x) float32 dask.array<chunksize=(175, 175), meta=np.ndarray>
lon (y, x) float32 dask.array<chunksize=(175, 175), meta=np.ndarray>
* time (time) datetime64[ns] 1979-10-01 ... 2021-09-25T23:00:00
* x (x) float64 -2.732e+06 -2.728e+06 ... 2.728e+06 2.732e+06
* y (y) float64 -2.028e+06 -2.024e+06 ... 2.024e+06 2.028e+06
metpy_crs object Projection: lambert_conformal_conic
Attributes:
description: SNOW WATER EQUIVALENT
grid_mapping: crs
long_name: Snow water equivalent
units: kg m-2
Looks like this dataset is organized along three dimensions (x, y, and time). There is a
metpy_crs coordinate attached:
crs = ds['SNOW'].metpy.cartopy_crs
crs
<cartopy.crs.LambertConformal object at 0x7f59be65c070>
Use Case 1: Load the full domain at a specific time step#
%%time
da = ds.SNOW.sel(time='2014-03-01 00:00').load()
### NOTE: `load()` is dask-aware, so it will operate in parallel if
### a cluster has been started.
CPU times: user 757 ms, sys: 151 ms, total: 908 ms
Wall time: 2min 4s
da.hvplot.quadmesh(
x='lon',
y='lat',
rasterize=True,
geo=True,
tiles='OSM',
alpha=0.66,
cmap='plasma'
)
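The label-based time selection used above can be illustrated on a small synthetic cube, since the real dataset requires AWS credentials to access (the dimension sizes here are arbitrary, chosen only for the sketch):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ds.SNOW: 4 hourly steps on a tiny 3x3 grid.
times = pd.date_range("2014-03-01", periods=4, freq="h")
snow = xr.DataArray(
    np.arange(4 * 3 * 3, dtype="float32").reshape(4, 3, 3),
    dims=("time", "y", "x"),
    coords={"time": times},
    name="SNOW",
)

# Selecting a single timestamp by label drops the time dimension,
# leaving a 2-D (y, x) field -- the same shape hvplot.quadmesh expects.
field = snow.sel(time="2014-03-01 00:00")
print(field.dims)  # ('y', 'x')
```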
Use case 2: Load the full time series at a specific grid cell#
ds.PREC_ACC_NC
<xarray.DataArray 'PREC_ACC_NC' (time: 368064, y: 1015, x: 1367)>
dask.array<open_dataset-60f4cd5b1dab559310716eb4d524a8baPREC_ACC_NC, shape=(368064, 1015, 1367), dtype=float32, chunksize=(144, 175, 175), chunktype=numpy.ndarray>
Coordinates:
lat (y, x) float32 17.65 17.66 17.67 17.68 ... 51.73 51.71 51.69
lon (y, x) float32 -122.6 -122.5 -122.5 ... -57.17 -57.12 -57.07
* time (time) datetime64[ns] 1979-10-01 ... 2021-09-25T23:00:00
* x (x) float64 -2.732e+06 -2.728e+06 ... 2.728e+06 2.732e+06
* y (y) float64 -2.028e+06 -2.024e+06 ... 2.024e+06 2.028e+06
metpy_crs object Projection: lambert_conformal_conic
Attributes:
description: ACCUMULATED GRID SCALE PRECIPITATION OVER prec_acc_...
grid_mapping: crs
integration_length: accumulated over prior 60 minutes
long_name: Accumulated grid scale precipitation
units: mm
SIDE NOTE: To identify a point, we will start with its lat/lon coordinates. But the
data is indexed in Lambert Conformal Conic x/y, so we need to re-project/transform
the point using the built-in CRS we examined earlier:
lat, lon = 39.978322, -105.2772194
x, y = crs.transform_point(lon, lat, src_crs=ccrs.PlateCarree())
print(x, y)  # these values are in LCC projection coordinates (meters)
-618215.7570892666 121899.89692719541
%%time
da = ds.PREC_ACC_NC.sel(x=x, y=y, method='nearest').sel(time=slice('2013-01-01 00:00','2013-12-31 00:00')).load()
CPU times: user 157 ms, sys: 25.1 ms, total: 182 ms
Wall time: 3.34 s
da.hvplot(x='time', grid=True)
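A common follow-on step (not shown in this notebook) is aggregating the hourly accumulations to daily totals. Since each PREC_ACC_NC value is the accumulation over the prior 60 minutes, a plain per-day sum is the right aggregation; a sketch on a synthetic series standing in for the extracted `da`:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two days of synthetic hourly accumulations (1 mm per hour).
times = pd.date_range("2013-01-01", periods=48, freq="h")
precip = xr.DataArray(
    np.ones(48, dtype="float32"),
    dims="time",
    coords={"time": times},
    name="PREC_ACC_NC",
)

# Sum the hourly accumulations within each calendar day.
daily = precip.resample(time="1D").sum()
print(daily.values)  # two days of 24 mm each
```

Because `resample` is dimension-aware, the same call works unchanged on the real one-year series extracted above.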
Stop cluster#
client.close(); cluster.shutdown()